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DIFFICULTIES WITH RECOVERING THE MASSES OF SUPERMASSIVE BLACK HOLES FROM STELLAR 



We investigate the ability of three-integral, axisymmetric, orbit-based modeling algorithms to recover 
the parameters defining the gravitational potential (mass-to-light ratio T and black hole mass M.) 
in spheroidal stellar systems using stellar kinematical data. We show that the potential estimation 
problem is generically under-determined when applied to long-slit kinematical data of the kind used 
for most black hole mass determinations to date. A range of parameters (T, M m ) can provide equally 
good fits to the data, making it impossible to assign best-fit values. The indeterminacy arises from 
the large variety of orbital solutions that are consistent with a given mass model. We demonstrate 
the indeterminacy using a variety of data sets derived from realistic models as well as published 
observations of the galaxy M32. The indeterminacy becomes apparent only when a sufficiently large 
number of distinct orbits are supplied to the modeling algorithm; if too few orbits are used, spurious 
minima appear in the ^{T , M,) contours, and these minima do not necessarily coincide with the 
parameters defining the gravitational potential. 

We show that the range of degeneracy in M. depends on the degree to which the data resolve the radius 
of influence of the black hole. For FWHM/2r^ > 0.5, where FWHM refers to the instrumental reso- 
lution, we find that only very weak constraints can be placed on M,. In the case of M32, our reanalysis 
demonstrates that when a large orbit library is used, data published prior to 2000 (FWHM/2? - /, « 0.25) 
are equally consistent with black hole masses in the range 1.5 x 10 6 .Mo < M, < 5 x 10 6 .Mo, with no 
preferred value in that range. Exactly the same data can reproduce previous published results with 
smaller orbit libraries. While the HST/STIS data for this galaxy (FWHM/2r^ w 0.06) may overcome 
the degeneracy in M., HST data for most galaxies do not resolve the black hole's sphere of influence 
and in these galaxies the degree of degeneracy allowed by the data may be greater than previously 
believed. 

We investigate the effect of regularization, or smoothness constraints, on the degree of degeneracy of 
the solutions. Enforcing smoothness reduces the range of acceptable models, but we find no indication 
that the true potential can be recovered simply by enforcing smoothing. For a given smoothing level, 
all solutions in the minimum-^ 2 valley exhibit similar levels of noise; as the smoothing is increased, 
there is a systematic shift in the midpoint of the x 2 valley, until at a high level of smoothing the solution 
is biased with respect to the true solution. These experiments suggest both that the indeterminacy 
is real - i.e., that it is not an artifact associated with non-smooth solutions - and that there is no 
obvious way to choose the smoothing parameter to ensure that the correct solution is selected. 

Subject headings: galaxies: elliptical and lenticular — galaxies: structure — galaxies: nuclei — stellar 
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1. INTRODUCTION 



Supermassive black holes (SBHs) are believed to be 
the central engines of active galactic nuclei and quasars 
l)Lvnden-Bell 19 69). A substantial fraction of the mass 
involved in the energy production is expected to col- 
lapse onto the central black hole. There is now ir- 
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refutable dyn amical evidence for a SBH at the ce nter 
of our Galaxy llGenzel et al. 19971 



Glicz et al. 



and 

in NGC 4258 (jMivoshi et al. 1995|) . In addition there is 



compelling evidence that compact mass concentrations 
probably SBHs - exist in the nuclei o f a handful of other 
galaxies. The STI S GTO program ^Joseph et al. 20011: 
iBower et al. 2001|) ; Merritt et al. 2001), and several 
HST GO projects jSarzi et al. 200"H iBarth et al. 20011: 
IHuehes et al. 200lHGebhardt et al. 2003|) have begun to 
extend the search for SBHs to a sample of roughly a hun- 
dred galaxies. 

Before this search was fully underway, a tight empirical 
correlation was discovered between the masses of SBHs 
and the velocities of stars in their host bulges. The M, — 
a relation: 



M. = (1.48 ± 0.24) x 10 8 .M ( 



200km s" 
a = 4.65 ±0.4 



(1) 



ijFerrarese and Merritt 200(1 iGebhardt et al. 2000rjj) . 
relates M. to a measure of the stellar velocity disper- 
sion in a region larger than the region directly influ- 
enced by the SBH, r/,, = GM,/a 2 (slope and normal- 
ization taken from {Merritt fe; Ferrarese 2001b]) ). The 
tightness of the relation depends crucially on the sam- 
ple used to define it: SBHs whose masses were derived 
from data that resolve define a relation with negli- 
gible scatter, \ 2 < 1, while including all published de- 
tections regardless of their quality yields a weaker rela- 
tion and a different slope l lFerrarese and Merritt 200(1 
iMerritt fc Ferrarese 2001 at IMerritt fc Ferrarese 2001rjj) . 
Almost all SBH masse s derived from groun d-based, 
stellar-kinematical data {Magorrian et al. 19981) scatter 
above the M. — a relation defined by the more secure 
masses. 

If the current normalization of the M, — a relation, 
equation J5J), is correct, SBHs in the more distant of the 
Magorrian et al. (1998) galaxies are too small for their 
radii of influence to have been resolved by existing tele- 
scopes. Modeling of such data is prone to systematic er- 
rors, the sign and magnitude of which depend on the form 
of the dynamical model fit to the stellar motions and the 
degree of undcr-resolution. van der Marel (1999) argued 
that the two-integral (21) axisymmetric models used by 
Magorrian et al. (1998) were likely to give spuriously 
large values of M,. 

The discovery that the ground-based mass estimates 
were systematically high resolved the discrepancies be- 
tween the mean SBH mass inferred from quasar statis- 
tics and reverberation mapping of (mostly) distant galax- 
ies, on the one hand, a nd from the kinematic s of nearby 
galaxies on the other iRichstone et al. ~9 98). All tech- 
niques now yield a mean ratio of SBH mass to bulge 
mass of ~ 10 -3 and a mean SBH mass density of ~ 
3 x 10 5 7W Q pc- 3 {Ferrarese 2002tlTremaine et al. 2 002). 

The biases associated with 21 modeling can in prin- 
ciple be removed if the data are compared with 
fully general, "three integral" (31) axisymmetric mod- 
els, in which any distribution of orbits is allowed 
1 . Such models have been used to estimate M, 

1 We adopt the standard name for these algorithms even though 



in a number of galaxies (l | van der Marel et al. 1 998) ; 
{Cretto n van den Bo sch 19991) : (Emsellem et a l. 
1999): llGebhardt et al. 2000a|) : {Gephardt et al. 2 003) ; 
([Cappellari et al. 2003a|) : 

jVerolme et al. 2002J) ) In contrast to 21 models, 31 
models can precisely reproduce a given mass distribution 
with many different orbital populations. This extra free- 
dom is so great that one does not necessarily expect to 
find a unique potential that yields a best fit to the data; 
indeed there may exist many choices for the parameters 
(T,M.) that reproduce the data equally well. This in- 
determinacy of potential estimation is well documented 
lIMerritt 1 993aUOerha,rd et al. 19981). 

In this paper, we discuss the importance of the degen- 
eracy in the context of stellar-dynamical estimates of M, 
in galactic nuclei. We apply a state-of-the-art 31 model- 
ing algorithm to various data sets, including previously- 
analyzed data from M32, as well as simulated data gen- 
erated from an axisymmetric model of M32. We investi- 
gate how accurately a 31 modeling algorithm can recover 
the true values of M, and T, and how sensitively the es- 
timates of those quantities, and their errors, depend on 
the quality of the data, the character of the data, and the 
number of orbits included in the model, and the degree 
of smoothing applied. 

Our conclusion is that the indeterminacy problem is of- 
ten severe. Even when modeling "good" data, the range 
of values of M, that can reproduce the data equally well 
is typically very large. (We define "good" data as data 
that resolve the SBH's sphere of influence; extend far 
enough in radius to constrain the global mass-to-light 
ratio; include high-order moments of the line-of-sight ve- 
locity distributions; and have small errors.) This de- 
generacy can formally be reduced by placing restrictions 
on the allowed functional form of the stellar distribu- 
tion function; indeed it was in just this way that Magor- 
rian et al. (1998) achieved their fits, by restricting / 
to a two-integral form. Another such restriction, com- 
mon in the more recent studies, is to force the 31 f to 
be smooth llCretton et al 19991: IGebhardt et al. 20031: 
iCappellari et al. 2003al iVerolme et al. 2002|) . Smooth- 
ness constraints might reasonably be expected to 
guide the optimization routine away from solu- 
tions that vary strongly between the data points, 
achieving good fits only by virtue of the dis- 
crete character of the data {Merritt fc Fridman 19961: 
Ualali fc de Zeeuw 2002j) . However we find no indication 
that smoothness on its own can overcome the inherent 
degeneracy in the potential estimation problem. Further- 
more, if used carelessly, smoothness constraints can bias 
the solution, yielding an apparent best-fit value for M, 
which lies far from the true value. 

In § 2 we give a detailed description of our 31 model- 
ing algorithm. § 3 reviews the reasons why the potential 
estimation problem is expected to be under-determined 
in the axisymmetric geometry. § 4 presents a 21 model 
of M32 that we use as a test case for our algorithm. § 5 
and § 6 present detailed results of fits of simulated data 
sets derived from the M32 model, and § 7 describes the 
results of a re-analysis of published data for this galaxy. 

orbits in axisymmetric potentials are sometimes characterized by 
fewer than three integrals. 
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§ 8 describes how the introduction of additional regular- 
ization or smoothness constraints affects the results when 
applied to the simulated M32 data. § 9 is a discussion of 
the implications of our results for the recovery of M. in 
nearby galaxies, and § 10 sums up. 



2. MODELING ALGORITHM 

2.1. Density and Potential 

We construct dynamical models of axisymmetric stellar 
systems with mass density p(w, z) and potential z); 
z = defines the equatorial plane. The mass density may 
contain contributions from stars, p*, as well as other com- 
ponents such as dark matter or a central black hole. The 
contribution to the mass density from the stars is derived 
from the luminosity density j» (zu, z) via the mass-to-light 
ratio T, p*(m, z) = Yj»(-n7, z). In this paper (as in most 
previous studies), T will be considered a constant, al- 
though in general, a spatially-dependent T could be used 
to represent the contribution of a dark halo or a radially- 
varying stellar mass-to-light ratio. 

Obtaining from the observed surface brightness 
profile is an under-determined problem for axisymmet- 
ric galaxies exce pt when the symmetry axis lies in the 
plane of the sky llGerhard fc Binnev 199ft iRvbicki 19871 
Romanowskv & Kochanck 1997). But galaxies in which 
the mass is stratified on similar concentric ellipsoids do 
have unique deprojections provided the inclination an- 
gle i is known. In general we would obtain p{w, z) via 
a non-parametric deprojection of the observed surface 
brightness profile (Merritt & Tremblay 1994, Merritt et 
al. 1997). In what follows, the focus is on the indetermi- 
nacy resulting from incomplete kinematical information 
and we will assume the freedom to specify a unique func- 
tional form for p*(za, z). 

The gravitational potential is assumed to be of the 
form 

z) = $.(tJ7, z) - GM./(^m 2 + z 2 ), (2) 

where $>*(vj, z) is the potential derived from the stel- 
lar luminosity profile and the second term is the contri- 
bution from a central black hole. An efficient way to 
evaluate §*{w, z) is via a truncate d multipole expansion 
Ijvan Albada fc van Gorkom 1977^1 : 

$» (r, 9) = -2vrG ]T P l (cos 0) x (3) 

Jo P l ^ al+2da + rl J ^( a )^T > 

Expressions for the forces in cylindrical coordinates are 
easily derived from equation 10} . The density distribu- 
tion is required on a grid in (r, 9) . Since all real elliptical 
galaxies have moderate to steep central density cusps, 
the radial grid is chosen to be logarithmically spaced. 
The potential between grid points is evaluated by bicu- 
bic spline interpolation. 

It proved convenient to choose an analytic form for 
the luminosity density. Since the simulated data de- 
scribed below were generated from a 21 model of M32 



(see § 3), we adopted t he parametrized form of th e lumi- 
nosity density used by l)van der Marel et al. 1998H (here- 
after vdM98) in the construction of this model: 

Mw,z) = j,(m) = jo(m/b) a [l + (m/b) 2 f[l + (m/c) 2 r, 

(4) 

where m 2 = m 2 + (z/q) 2 , and a = -1.435, (3 = -0.423, 
7 = -1.298, b = 0.55", c = 102.0", q = Q.73(i = 90°). 

The potential due to the density distribution can 
be derived directly via Poisson's equation and the forces 
via numerical quadrature. We tested the accuracy of 
the multipole expansion scheme by comparing the force 
evaluations with those obtained via quadrature. We took 
Imax — 6 and set the inner radius of the grid at 6 x 10 -4 ". 
80 radial grid points and 8 polar grid points were se- 
lected for the multipole expansion. These tests showed 
that the multipole expansion gives forces that have frac- 
tional errors of ~ 10~ 3 at the innermost radius, dropping 
rapidly with increasing radius. The use of the multipole 
expansion scheme results in an approximately eightfold 
reduction in orbit integration times compared with force 
evaluation via quadrature. 

2.2. The Orbit Library 

All orbits in a steady-state axisymmetric Hamilto- 
nian respect at least two isolating integrals of the mo- 
tion: the orbital energy E and the angular momen- 
tum L z about the symmetry axis. A non-resonant or- 
bit with only these two integrals would completely fill 
the region of the meridional plane enclosed by the zero- 
yelocity curve (ZV C ). However num e rical studies e.g . 
l|Contopoulos 19601: lOllongren 1961 iRichstone 1982ft 
show that most orbits also conserve a third integral, 
I3, which confines the orbit to a subset of the allowed 
meridional-plane region. When the third integral is 
present, the orbit touches the ZVC at a finite number of 
points. Launching orbits from uniformly-spaced points 
on the ZVC ensures a reasonable sampling of the third 
dimension of phase space accessible to regular orbits. 

Each orbit is integrated for a fixed number of peri- 
ods and its properties stored. The number and nature 
of stored properties is determined by the available data. 
Since the purpose of generating the orbit library is to de- 
termine the linear combination of orbits that best repro- 
duces the data, we need to "observe" each orbit under 
conditions as close as possible to the conditions under 
which the data were taken. This involves convolving the 
intrinsic orbital properties with the seeing function, as 
well as averaging over the observed slit width and aper- 
ture size. The result is a set of quantities associated with 
the orbits that can be linearly co-added and compared 
with the data, without any need for interpolation. In the 
remainder of this section we describe the various steps in 
the generation of the orbit library. 

Orbital Initial Conditions 

Our choice of orbital initial conditions is similar to that 
of vdM98 and Cretton et al. (1999). We first select a ra- 
dial grid of Ne points logarithmically spaced from cc7 m i n 
to G7 max ; for the mass model of equation we took 
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w m ; n = 5 x 10~ 4 "and tu max = 7.5 x 10 3 ". At each ra- 
dial grid point Wi , the energy of the circular orbit of ra- 
dius Wi is Ei = {l/2vji)d<& / dwi + §{mi, 0), thus defining 
the energy grid. The maximum allowed angular momen- 
tum at energy Ei, L z max , is determined by the angular 
momentum of a circular orbit. At each energy we choose 
Nj regularly-spaced values in L z on the open interval (0, 
-^max) (i- e - excluding L* = and Lj = L l max , which cor- 
respond to radial and circular orbits respectively). This 
grid only selects orbits with one sense of rotation about 
the symmetry axis, but orbits with the opposite sense 
of rotation are trivially obtained by flipping the sign of 
the velocity. For each pair (Ei,L l j) we then compute 
the ZVC, the curve on the meridional plane where the 
effective potential is zero: 

$ cff = $(n7,z) + i^|=0. 

2 vj z 

The third quantity chosen to define an orbit is the angle 
(3 between the major axis (x) and the line joining the 
origin and a point on the ZVC. We select Np equally- 
spaced angles (3 in the open interval (0,7r). In the tests 
described below, we computed for each mass model a 
library with (N E ,N L ,Np) = (62,9,8) for a total of ~ 
4464 orbits having one sign of rotation, or 8928 orbits 
overall. 

Orbits were integrated in the meridional plane using 
an explicit Runge-Kutta integrator of order 8(5,3) due 
to (Hairer & Wanner 1993) with adaptive step size con- 
trol but which give dense output at equally space time 
intervals. The integration interval (A per i 0( j) was taken to 
be 200 periods of the circular orbit at each energy and 
orbits were sampled at N stcp = 100 equally-spaced time 
steps during each orbital period. Orbits were launched 
from the ZVC with initial velocities — v z — 0. At the 
end of the integration the energy of the orbit was always 
conserved to a (relative accuracy) of better than 1 x 10~ 5 . 
While integrations were carried out in the meridional 
plane, we require the orbit in Cartesian coordinates in 
order to compare with the observed data. Cartesian co- 
ordinates (x,y,z,v x ,v y ,v z ) were computed by assuming 
a random azimuthal angle < < 27r at each time 
step and v^jt) = L % Jw{t). Unlike othe r authors e.g. 
ijCretton et al. 1999liVerolme et al. 2002!) we do not see 
the need to "dither" the orbits to create packets of or- 
bits. Also, unlike these authors we compute the forces 
precisely (from the multipolc expansion routine) at each 
point in the orbit rather than interpolating from forces 
stored on a grid in (w, z). Once the orbit is integrated in 
the potential the observed properties of the orbit need to 
be transformed to the correct viewing angle based on the 
assumed inclination i of the model; this gives an addi- 
tional set of coordinates {x' , y' , z' , v' x , v' y , v' z ), with x' and 
z' coinciding with the projected major and minor axes 
respectively and v' y , the observed line-of-sight velocity. 

The Storage Grids 

The orbital properties are stored on three kinds of 
grid, depending on the type of observational constraint. 
These sto rage grids are si mil ar to those used by other 
authors iRix et al. 1997t Ivan der Marel et al. 1991 
iCretton et al. 1999HVerolme et al. 2002f) . 



To reproduce the known mass distribution of the model 
(self-consistency constraints), we store the orbital con- 
tribution to the mass of each cell on a grid in the (r, 9)- 
plane. We use 20 logarithmically-spaced radial bins and 
16 equally-spaced bins in 9 (0 < 9 < tt/2). For the M32 
mass model described above, the lower and upper radial 
grid points were at ~ 5 x 10~ 4 "and 102". At each time 
step the orbital position (m, \z\) determines the cell to 
which a fractional weight <5 is added. The total mass 
contributed by the ath orbit to the grid cell centered on 
(r, 9) is a sum of all the fractional weights and is repre- 
sented by m" e . 

The orbital kinematics are stored on 3-D grids in the 
projected coordinates (x',z',v' y ). Each set of observa- 
tions (defined by different seeing, aperture locations etc.) 
requires a separate grid. The grids themselves are square 
in the x' — z'-plane with outer boundaries set by the out- 
ermost observed aperture. For the models in this paper 
the typical grid consisted of 267 x 267 pixels with the 
bin width equal to ~ 1/8 the FWHM of the PSF (or 
seeing in the case of ground based data). So for instance 
for all data from the HST (FOS and STIS) the orbit 
libraries were stored on grids with pixel width 0.0125" 
whereas for ground based CFHT data (e.g. Bender et 
al. 1996) the pixel width was 0.038". The grid in the ve- 
locity dimension has 107 points in the range [-800km s _1 , 
800km s _1 ] or a velocity sampling of 15.1km s _1 . This 
is smaller than the velocity scale of the STIS spectro- 
graph (~ 19km s _1 per pixel at ~ 8500 A or a wave- 
length scale of 0.56 A per pixel). In general it was 
found necessary to use a velocity range which is at least 
±4— 6 x <7 max , where <7 max is the largest observed velocity 
dispersion. 

It is standard practice to generate orbit libraries for a 
single value of the mass-to-light ratio To and to generate 
libraries for all other T values by scaling the velocities 
by a factor ^T/T (e.g. vdM98; Cretton et al. 1999). 
We will refer to the library generated using M/L = To 
as the "primary library" for each value of M m . It is im- 
portant that the choice of To be determined by a prior 
estimate of the best- fit value of M/L (based on e.g. 21 or 
spherical models). If the velocities in the primary library 
are stored on a grid with range [— vo,vq] and grid spac- 
ing Svq, the scaled velocities for any other T will have 
a range [— ^TJTqVo, ^/T/To«o] and velocity spacing of 

s/T /TqSvq. The value of vq must be set by the smallest 
Tfrnin for which the model will subsequently be scaled: 
-v/Ymin/Yo^o — 4cr max , and the value of Svq should be 
set by the largest T max to which the model will be scaled: 
\/Y max /To<5vo — Av bs where A« b s is the velocity sam- 
pling of the highest-resolution spectrographic data set. 
Carelessness in this regard can lead to spuriously poor 
fits to data at low and/or high values of T. 

Since we store the orbit at equal time intervals, each 
time the ath orbit passes through a cell centered on 
(x',z',v' y ) it adds a constant fractional weight 6 — 
l/(A r pcr iod x A stcp ) in that cell. At the end of the integra- 
tion we store the total weight ui", z , v , contributed by this 

orbit to each cell. In practice it was found to be better 
to construct the orbital LOSVDs using a kernel density 
estimator (with a kernel width of 2.5x8vq ~ 38km s _1 ) 
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rather than by simple binning in v' since this results in 
smoother LOSVDs without compromising velocity reso- 
lution of the orbital LOSVDs. This practice significantly 
improves the accuracy and speed of fitting the observed 
LOSVDs. 

A final grid in the (r, 9)- plane is used to store 3-D 
kinematical properties of the orbits. We store the den- 
sity weighted (un-centered) first and second moments of 
the LOSVDs in spherical polar coordinates: ~pup, J>v$, ~pvg 

and pv%, pv^, pvg. These 6 quantities as well as p the 
average density (in the cell) are computed and stored in 
each of the 20 radial and 16 polar cells described above. 
These quantities are not used in fitting the data but are 
useful for analyzing the properties of the resulting mod- 
els. 

PSF- Convolution 

Convolution with the point spread function (PSF) is 
essential when comparing the orbit libraries with the ob- 
servations. The choice of Cartesian grids in [x' , z', v' y ) for 
storing the kinematical data is driven by the fact that 
PSF convolution is most easily carried out in Fourier 
space via standard Fast Fourier transforms (FFTs). 

For this paper we assume that all PSFs are circularly 
symmetric Gaussian (or multiple Gaussian) with FWHM 
given by the observed seeing. Bower et al. (2001) have 
shown that the STIS/CCD PSF at - 8500 A has a 
FWHM = 0.079" with a broad asymmetric wing on one 
side. This ring represents the first Airy ring in the PSF 
and probably arises from misaligned optical elements. 
Bower et al. also carried out tests with synthetic spectra 
to show that a symmetric model PSF obtained by folding 
and averaging the true PSF about the center reproduces 
the observed data to within the errors. They found that 
even when noise was not added to the spectrum, the 
kinematic measurements from the model PSF and the 
observed PSF were not statistically different. We there- 
fore use a PSF which is a circular Gaussian with FWHM 
of 0.1" for both the PSF convolution with the orbit li- 
brary, as well as for generation of the simulated-data. 

PSF convolution with a seeing function correlates the 
data in the two spatial directions but does not affect data 
in the velocity direction. Therefore PSF convolution is 
carried out separately for each 2-D velocity slice of each 
of the (x', z', v'y) grids. PSF convolution redistributes the 
orbital weights and we now represent the weight due to 
the ath orbit in the bin centered on (x' , z',v' y ) by Cj", z , v , . 

In order to properly scale the orbital LOSVDs observed 
though different apertures, it is essential to know the to- 
tal flux observed through each aperture. In general this 
information is not available from the kinematical data. 
We therefore compute this from the model density dis- 
tribution on a Cartesian grid with the same spatial res- 
olution as each of the kinematic storage grids. These 
projected mass grids are also convolved with the appro- 
priate PSFs. The resultant projected mass in each grid 
cell is represented by s°kj, . 

(PSF convolution was carried out using a FFT routine 
originally written by Norman and Brenner of MIT Lin- 
coln Labs in 1968 and modified for the current problem 
and kindly made available by R. van der Marel.) 



"Observing" the Orbit Library 

After each velocity slice of the Cartesian storage grid 
and the Cartesian projected mass grid is convolved with 
the PSF, the kinematic properties of each orbit (and its 
projected mass) are "observed" through the same set of 
apertures as the data. Following Rix et al. (1997) and 
Cretton et al. (1999) we use a simple scheme to com- 
pute the contribution of each pixel of a storage bin to 
each aperture. Each pixel contributes a fraction t x i z h 
to the Ith aperture, where < t x > z h < 1 depending on 
whether the pixel centered on (x', z') lies entirely outside 
the aperture, on the edge of the aperture, or entirely in- 
side the aperture. Since the positions and orientations 
of the apertures relative to the Cartesian grids is fixed 
for all the individual orbits these t x > z >i are computed at 
the start of the orbit library program and stored. The 
un-normalized LOSVD of the ath orbit as seen through 
the Ith aperture is then obtained simply by 

ATf (<) = $>^ x • T *'*'<- ( 5 ) 

x' : z' 

The total orbital mass contribution to the Ith aperture 
is 

m i = Yl Q x'z'v' y ■ T x'z'l- (6) 

x',z',v' y 

Finally, as noted earlier, the observed flux through each 
aperture is information that is not generally available 
from the data but is required for proper scaling of the 
LOSVDs. We therefore compute the "observed" mass in 
each aperture M° hs from the theoretical surface density 
profile of the model via 

M? hs = J2K'!'-T X '*n. (7) 

x' ,z' 

2.3. Constructing the Model 

The construction of a 31 model to fit the constraints 
now consists of finding a weighted superposition of the or- 
bits that best reproduces both the assumed model stellar 
density distribution p(zu, z) and the observed LOSVDs, 
or some representation of the LOSVD. If there are N c 
is total number of observational constraints (mass and 
velocity), and N Q is the number of orbits, we minimize 
the mean square deviation in the quantity % 2 , where 

m=JVc / a=N a \ 2 

x 2 = ^ E An - E rB«) , (8) 

m— 1 \ ot—1 / 

subject to a basic set of non-negativity constraints: 

7 a > 0. (9) 

In the set of equations above 7" is the weight assigned 
to the ath orbit, D m are the N c observational constraints 
and B!^ is the contribution of the ath orbit to the mth 
constraint. The matrix elements D m and are re- 
placed by the various observable quantities described in 
§ l2.2.0l as well as other quantities that are required to con- 
struct the self-consistent model, such as the mass M°g s 
in each cell. This is not an observed quantity but is 
derived from p(zu, z). The corresponding orbital masses 
m" e that are superposed are weighted by 7" such that, 

M° e b8 = E^<^ ( 10 ) 

a 
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In principle one can attempt to fit all the observed 
data as well as the mass (self-consistency) constraints 
to within numerical precision. In practice, the observed 
LOSVDs (and quantities derived thereof) have finite er- 
rors and there is nothing to be gained by attempting such 
precision in the model fits. Following standard proce- 
dure, we account for the errors in different quantities by 
dividing both the observed data and the corresponding 
quantity in the orbit library by the error on the observed 
data. 

Since the self-consistency (mass constraints) in eq. ijTol 
are derived and not observed quantities, there are no 
"observed errors" on them. It is therefore possible to ar- 
bitrarily set the relative weighting of the kinematic con- 
straints and mass constraints (which have essentially in- 
finite accuracy). Instead of an error we use a constant 
scaling factor (l/SM) which sets the weight of the mass 
constraints relative to the kinematical constraints. For 
each data set one needs to experiment to determine the 
scaling factor that gives a consistently good fit to the 
mass constraints for all input parameters while satisfy- 
ing the kinematic constraints. (Note that unlike Rix ct 
al. (1997) we do not explicitly include aperture mass 
constraints in the objective function because here too 
the errors in the aperture masses are unknown. If wc 
were to include them, this would introduce yet another 
free scaling factor. Also, unlike Rix et al. we do not sep- 
arately fit the surface density distribution, since this is 
automatically guaranteed by an accurate fit to the mass 
distribution. We have found that it is generally possible 
to fit the meridional plane masses to a fractional accu- 
racy of ~ 10 -2 — 10~ 5 over the entire M, — T plane and 
this always guarantees a fit to the projected mass (or 
equivalently surface brightness distribution) with error 
of less than 1%.) 

The second set of constraints to be fitted are the 
kinematic constraints, consisting of the LOSVDs ob- 
served through each aperture. The un-normalized orbital 
LOSVDs given in equation (j^J can be linearly superposed 
to obtain a fit to the observed LOSVDs: 



(ii) 



Since LOSVDs are often approximately Gaussian in 
shape, it is common practice to represent the observed 
LOSVDs through a truncated Gauss-Hermite series. The 
highest quality spectra can yield useful Gauss-Hermite 
moments up to order 6; fitting of moments up to order 
4 is now standar d. We follow the method suggested by 
ijRix et al. 19971) to linearly superpose mass-weighted or- 
bital GH moments that are linear in the orbital LOSVDs 
and refer the reader to this source for details. The ob- 
served kinematic data do not include information on the 
lowest order moment of the LOSVDs (ho or 70), or the 
total flux through each aperture (M° hs ) 

Previous authors ha ve fitted eithe r the GH 
moments ( e.g. llRix et al. 19971): vdM98 ; 



i Cretton et al. 19991): llCretton fc van den Bosch 1999J) : 
( ^ 

( Verolme et al. 2002D1 _ or the enti re LOSVD 
( Gebhardt et al. 2000allBower et al. 2001(1 . In principle 
it is possible to fit a combination of both kinds of 



constraint. It is generally observed that LOSVDs are 
likely to deviate strongly from a Gaussian (due to 
high-velocity wings) only in a few apertures close to 
the center. For these apertures it may be important 
to fit the full LOSVD. If the LOSVDs are explicitly 
fitted in the central apertures labeled by I, 1 < I < h, 
and the lowest few GH moments are fitted elsewhere, 
li + 1 < / < i max , then the problem of fitting the data 
via a linear superposition of the orbits can be viewed 
as a problem of minimizing an objective function of the 
form 



r() 



M 



obs 



E 



5M 

Nr hs (v'y)-E a rN?(v' y ) 



1=1 L 

! m « /i max 

E E 



A(N^(v' y )) 

M? hs ht s -E a m% 



A(M° hs h°^ 



(12) 



The mass-weighted Gauss Hermite moments H arc 
given by 

Nnv' y )g(w)H t (w)dv, (13) 

-OO 



i = l,h n 
S(l/) = (27T) 



-l/2„-!/ 2 /2 



10= (u - Vl)/(Tl. 

Typical values of /i max are 4 or 6. We are free to multi- 
ply each pair of terms inside the same square brackets in 
the objective function by a constant factor, e.g. a scal- 
ing factor or an inverse error. In equation 1121 we have 
multiplied each term by an inverse error for illustration. 
This gives equal weight to each of the different terms in 
equation IT21 

Minimization of the objective function was car- 
ried out using two different software packages: the 
quadratic programming algorithm E04NCF of the NAG 
libra ries, and a non-negative least squares (NNLS) rou- 
tine l|Lawson fc Hanson 1995(1 . The two algorithms gave 
similar results; for all models described below we present 
the fits obtained using the NAG routine. 

Unless otherwise noted, we use the symbol x 2 to rep- 
resent the objective function including all quantities in- 
cluded in the fit and not just e.g. the kinematical con- 
straints. Since the objective function includes errors in 
the measured quantities, x 2 as we define it should be 
loosely interpreted as a reduced x 2 , although as we dis- 
cuss below, that interpretation is problematic. 

2.4. Regularization 

One disadvantage of an orbit-based approach to model 
building is that the solutions are extremely unsmooth. 
One source of this lack of smoothness is the discrete way 
in which phase space is sampled. But even more impor- 
tant is th e inherent ill-co nditioning of the self-consistency 
problem (Merritt 1993bJ. A single orbit, which repre- 
sents a delta-function in integral space, covers a finite 
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region in configuration space. Deriving the integral-space 
density from the configuration space density is there- 
fore a deconvolution problem, and deconvolution has the 
property of amplifying errors or incompleteness in the 
data. Even a highly noisy set of orbital weights can gen- 
erate a smooth configuration-space density, and there are 
many more noisy solutions than smooth ones. This ef- 
fect actually becomes worse as the number of orbits is in- 
creased since a fine grid is better able th an a coarse gri d 
to represent high-frequency fluctuations (PhilliDS 1962). 

Lack of smoothness is an inconvenience when plotting 
deprojected quantities, and for this reason it has be- 
come standard practice to couple Schwarzschild's tech- 
nique with some sort of "regularization" scheme to en- 
force smoothness (e.g. Richstone & Tremaine 1988; Cret- 
ton et al. 1999; Gebhardt et al. 2003). But a deeper 
worry is that the ill-conditioning might lead the opti- 
mization algorithm toward a non-smooth solution that 
has no smooth counterpart. If imposing smoothness on a 
numerical solution causes it to depart strongly from self- 
consistency, one would conclude that no solution contin- 
uous in the phase-space variables exists, and that the ap- 
parent self-consistency is a numerical artifact associated 
with the discretization. Merritt & Fridman (1996) first 
investigated this question in the context of Schwarzschild 
modeling of triaxial galaxies; they found that their non- 
smooth solutions had the same, average properties as so- 
lutions for which smoothness was imposed. On the other 
hand, Jalali & de Zeeuw (2002) found in modeling scale- 
free disks that spurious solutions could be generated by 
using a number of orbits that was large compared to the 
number of mass constraints. 

In the context of potential estimation, we need to check 
that the indeterminacy in quantities like M m is not an ar- 
tifact of noise in the solutions. For instance, it is possi- 
ble that solutions with the "wrong" M, are much noisier 
than solutions with the "true" M,, or that the range 
of indeterminacy is strongly dependent on the level of 
smoothing. 

A standard way to regularize is by adding a penalty 
term to the objective function (JHJ): 

m=N c / a=N \ 2 a=N 

x' 2 = ^E An- £7^ +a£p (7 «) 

m— 1 \ a— 1 / a— 1 

(14) 

where P(j a ) is define d to be large and positive when the 
solution is unsmooth ijPhillips 1962t ITikhonov 19631) . A 
number of choices are possible for P(^y a ), depending on 
the definition of "smoothness." Here we follow Merritt 
& Fridman (1996) by adopting "zeroth-order" regular- 
ization: 

P{l a ) = (7 Q ) 2 (15) 
(e.g. Miller 1974) which has the effect of filtering fluctu- 
ations on scales shorter than some maximum value de- 
termined by the smoothing parameter A. Models with 
A = have no regularization and models with A —* oo 
are characterized by uniform orbital weights. 

Having obtained a solution by minimization of equa- 
tion (|14fl . one would like to measure the degree of 
smoothness. The simplest way would be via P(j a ), with 
7" the orbital weights corresponding to the smoothed 
solution. 



Alternatively one can attempt to measure the degree 
of smoothness in phase space of the function j(E, L z , (3) 
the orbital weights on the grid of orbital initial con- 
ditions described in § 12.2.01 Following Cretton et al. 
(1999) we compute the second-divided difference (in 
place of second derivative) of the dimensionless function 
j(E, L z , /3)/7o(-E). 7o(£') the "reference weights" and 
are a rough approximation to the energy dependence 
of the model. Following Rix et al. (1997) we employ 
the simplest possible regularization by setting all the 
7o(-E) = 1 and characterize the smoothness via the noise 
parameter: 

-. N R 

n = — T (16) 

I— 1 

( d 2 7 (E,L Zl P) d 2 7 (£,£ z ,/?) d 2 7 (£,£ z ,/?) A 

V dE 2 dh\ dp 2 ) i 

where d 2 j(E, L z , (3)/dE 2 etc. represent the second di- 
vided differences of the weights of adjacent orbits in the 
space (E,L Z ,{3), and Nr is the number of interior grid 
points for which a second divided difference can be com- 
puted (e.g. Cretton et al. 1999 ). 

We have used both the quantities P(j a ) and n to 
quantify the degree of noise and find little difference in 
the results. Since the quantity n has been used in other 
studies and is more physically meaningful we use it to 
represent the degree of smoothness of our models in the 
discussion in § 7. 

It is interesting to note that for any smoothed model 
the contributions from different parts of phase space to 
the total noise (n in eq. I17[) depend primarily on energy 
E remaining roughly constant at all values of (L z , (3) at 
a given energy. The noise in phase space is smallest at 
small energies and increases slowly with radius (energy) 
reaching a maximum at ~ the 35th energy level dropping 
slowly thereafter. 



3. INDETERMINACY OF THE THREE-INTEGRAL 
PROBLEM 

Before discussing the results obtained by applying our 
31 modeling algorithm to simulated data, we review the 
reasons why we expect the potential estimation problem 
to be under-determined in the axisymmetric geometry, 
given the sorts of data (kinematical quantities measured 
along multiple long slits) that we are dealing with here. 

Consider first the spherically symmetric case. De- 
projection of S(i?) yields j(r), the luminosity density, 
uniquely; given values for (T, M,), the mass density p{r) 
and potential <I>(r) are also known. Suppose that the stel- 
lar distribution function / is isotropic, / = f(E). Then 
Eddington's formula gives the unique / that reproduces 
j(r) in this $(r), and corresponding to this unique / is 
a particular RMS velocity profile cr 2 (r) = J f(E)v 2 dv. 
Changing $ will change both / and a in well-defined 
ways, so that the goodness-of-fit of er(r) to the ob- 
served RMS velocities will vary continuously with the 
parameters (T,M.) that define the potential. There- 
fore, there will generally exist a best-fit (minimum \ 2 ) 
set of parameters for any kinematic al data set. This has 
been illustrated in numerous studies (|The fc White 1 986: 
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Little fc Tremaine 1987t iKulessa fe Lvnden-Bell 19921: 
Merritt fc Tremhlav 19931). 

Suppose next that the stellar distribution function has 
the more general form / = f(E, L 2 ) with L the angular 
momentum per unit mass. There are now many func- 
tions f(E, L 2 ) that can reproduce a given j(r) in a spec- 
ified since j(r) is a projection over velocities of 
f(E,L 2 ) and different 2D /'s can have exactly the same 
ID projection. The same is true if additional moments 
of the distribution function (e.g. <r(r)) are added as con- 
straints: many 2D functions / are still able to repro- 
duce a finite set of such ID constraint functions. This 
means that one has the freedom to vary / along with 
$ in order to maintain the goodness-of-fit to the data, 
subject only to the constraint that / be non-negative. 
The result is an indeterminacy in the parameters that 
define the potential: in general, there will be a range 
of potentials for which / can be adjusted such that the 
fit to the data is equally good, and no "best-fit" po- 
tential can be found. The indeterminacy of potential 
estimation in the spherical geometry has been exten - 
sively demonst rate d (e.s. llDeionghe fc Merritt 1 9921: 
(|Merritt 1993aj) .b: l|Merritt k Saha 19 93)). These stud- 
ies document that the range of allowed potentials - i.e. 
potentials consistent with a non-negative f(E,L 2 ) given 
a finite set of data constraints like and <j obs (R) - 
can be extremely wide. 

Consider next the axisymmetric case. Inversion of 
Y) can give j(r, 9) uniquely if the galaxy is known 
to be edge-on; otherwise there is an ind eterminacy in 
j (IRvbicki 1987t IGerhard fc Binnev 19961) . We ignore 
that indeterminacy here and assume that j{r,0) is pre- 
cisely known. Suppose first that / is restricted to 
its simplest possible form consistent with axisymmetry, 
/ = f(E,L z ). Just as in the spherical isotropic case, 
there is a unique, 21 f that reproduces a given j(r,&) 
in a specified <l>(r, 9) l)Lvn den-Bcll 1962; iHunter 19751 
Dcionghc 1986). Furthermore this unique / is associ- 
ated with unique values for the mean square velocity at 
every point in the projected image. Varying $ will force 
both / and its associated velocity field to vary, hence 
once expects to find a single set of values (T, M,) that 
provide the best fit to the measured velocities. This has 
been verified in a num ber of 21 modeling studies (e.g. 
Binney et al. 1990; (pehnen 19951: iQian et al. 19951: 
iMagorrian et al. 19981) V 

In the general axisymmetric case, / is a function of 
three variables, / = f(E, L z , I 3 ) (assuming as above that 
all orbits are characterized by three isolating integrals). 
Just as in the anisotropic spherical case, there are now 
many functions f(E, L z , I3) that can reproduce a known 
j(r,9) in a specified <&(r, 9), since many 3D functions / 
project to the same 2D function j. The same will be 
true if to j are added a finite set of 2D data constraints, 
such as the mean square velocity measured over the im- 
age of the galaxy. The argument that was made above in 
the anisotropic spherical case then applies to the 31 ax- 
isymmetric case: changes in the assumed form of $(r, 9) 
can generally be compensated for by changes in / so as 
to leave the fit to any finite set of 2D data constraints 
precisely unchanged, and one expects to find a range of 
potentials over which the goodness-of-fit to the data is 
constant. The extent of this constant-x 2 region is deter- 



mined by the requirement that / > 0; if the potential is 
made sufficiently extreme, the only /'s that can repro- 
duce the data will be negative somewhere in phase space, 
and the fits of non-negative f's to the data will begin to 
suffer. 

In the anisotropic spherical case, it is generally be- 
lieved that measuring the LOSVDs at a large enough 
set of radial positions can uniquely constrain the po- 
tential. Numerical experimen ts seem to bear this out 
ijMerritt 1993at IGerhard 1993]) although only a small set 
of cases have been tested and no general theorems have 
been proved. Similarly in the 31 axis ymmetric case, it 
is hoped (e.g. ijCappellari et al. 2003Q)) that sufficiently 
good, 2D data will uniquely constrain both <£>(r, 9) and 
f{E,L z ,I§). This is at the present time only a hypoth- 
esis, and given the non-linear relation between the data 
and the potential, we expect that a given data set will 
either under-, or over-constrain the potential; a precise 
match between the information content of the data and 
potential seems difficult to achieve. 

We stress that the indeterminacy discussed here is 
mathematical, not statistical, in nature, and is not 
due simply to the fact that operations like deprojection 
are ill-defined when data are noisy or incomplete (al- 
though th ose factors may contribute to the indetermi- 
nacy e.g. <)Cretton fc Emsellem 20 03)). This means that 
any statistic like \ 2 that measures the mean deviation be- 
tween the data and the model will generally be precisely 
constant over finite regions of parameter space - regions 
in which the data functions predicted by the model are 
unchanged as the model parameters are varied. We sug- 
gest that a sensitive test of the quality of a 31 modeling 
algorithm is its ability to reproduce such perfectly-flat \ 2 
plateaus, since any limitations in the flexibility of the al- 
gorithm will keep it from reproducing some /'s as well as 
others, resulting in spurious minima in x 2 ■ For instance, 
if a 31 algorithm were written in such a way that it could 
only reproduce the subset of /'s satisfying / = f(E, L z ), 
one would always find a unique minimum in x 2 {^ 1 M,). 

4. A TEST CASE: A 21 MODEL OF M32 

We would like to test our algorithm against a reason- 
ably realistic, axisymmetric galaxy model whose proper- 
ties are precisely known. For this purpose we constructed 
an axisymmetric two-integral (21) model, / = f(E,L z ), 
with properties very similar to those of models that have 
been fitted in the past to data from M32. In this section 
we describe the construction of that model and the way 
in which we generated simulated "data sets" from it. 

We constructed 21 models using the Hunter & Qian 
(1993) (HQ) prescription to derive the even part of the 
distribution function from a given mass model. The 
mass model was represented by a sum of 3D Gaussian 
functions using the Multi Gaussian Expansion (MGE) 
method (Monnet et al. 1992; Emsellem et al. 1994). 
This method allows one to obtain a simple analytic form 
for the potential; the HQ derivation is also simplified 
due to the fact that the exponential form (Gaussians) 
separates well in the complex plane. Thus an analytical 
continuation of the potential known only on the real axis 
is straightforward. (It is important to note that while 
the MGE method described below is used to generate 
the density profile for the pseudo-data and the orbit li- 
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brary is constructed using the analytic density profile in 
eq.0J both density profiles agree extremely well with each 
other.) 

The 21 models were designed to give a good fit to 
all space-based and ground-based observations of M32 
available up to the year 2001. These data include 
long slit spectra along four position angles, and one 
slit offset from the major ax is, obtained with the WHT 
l|van der Marel et al. 1994atk CFHT spectra (Bender et 
al. 1996); HST/FOS spectra at eight apertures close to 
the major axis (van der Marel et al. 1997); and the 
HST/STIS spectra of Joseph et al. (2001). 

A fit to the surface brightness distribution was ob- 
tained by applying the MGE method to both a wide 
field and a high resolution I-band image. The wide 
field image, kindly provided by R. Michard and taken at 
INT/PFCU, contained 382 x 575 pixels (0.549 "/pixel); 
the resolution was modest, > 2 "FWHM. The MGE fit 
was first done directly on the wide field image to con- 
strain the large scale luminosity distribution, after mask- 
ing any point sources (e.g. stars). The fit was found to be 
good down to 19.5 mag arcsec~ 2 with the sky becoming a 
problem at fainter levels. The broadest Gaussian had a a 
of about 45": this means that at a radius of 100", the lu- 
minosity of the model drops very rapidly (exponentially) . 
Previous tests have shown that this should not influence 
the central kinematics (Emsellem et al. 1999). The low- 
frequency components (Gaussians with cts larger than 8") 
of the original fit were then removed from the high res- 
olution image (in the case of M32 the WFPC2/F814W 
image was used after proper normalization). A fit was 
then performed against the residuals using a 4-Gaussian 
approximation for the WFPC2 PSF in the F814W filter. 
The resultant fit provides the deconvolved model for the 
surface brightness at the very center (for more details 
see Emsellem et al. 1999). The final model for M32 con- 
sisted of 11, 2D Gaussian components. Since even the 
HST WFPC observations have a finite spatial resolution 
which causes a spurious turnover in the central density, 
the central luminosity profile was replaced by a power- 
law component, or cusp. This cusp was prescribed as in 
Emsellem et al. (1999), with a power law slope of 1.5 
(j*(r) cx r -1 - 5 ) and a Gaussian width of 0.05". 

The total energy of the model was kept constant 
when the cusp was added and this additional compo- 
nent did not change the fit of the surface brightness 
distribution in the central parts. The even part of /, 
/+ = h[f(E,L z ) + f(E,—L z )), was then derived for an 
assumed angle of inclination i, mass-to-light ratio T and 
black hole mass M,. The simulated data sets described 
below were derived from a model with i = 90° (edge-on) , 
T v = 2 and M. = 2.625 x 10 6 M Q . The odd part of / 
was chosen following the prescription of Emsellem et al. 
(1999), by flipping the direction of orbits with respect 
to the symmetry axis until the best fit was obtained to 
the observed kinematics. The projected LOSVDs were 
then computed on a very fine grid (1600 logarithmically 
spaced points within the one quadrant of the central 
15"). Finally, the LOSVDs were convolved to take into 
account the seeing and the instrumental PSFs and aver- 
aged over the apertures (pixel sizes) appropriate to each 
set of simulated observations. We assume a distance to 
M32 of 0.7 Mpc, as in earlier studies (e.g. vdM98). 
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Fig. 1. — All mass and kinematical constraints for simulated 
data set A. (a) The model density in a total of 266 cells at 16 radial 
intervals and 14 polar angles (9 in degrees). The density is plotted 
in arbitrary units (density profile for each polar angle is offset from 
the previous angle by 1 unit). Error bars used in the actual fits are 
plotted for 8 = 14 but are multiplied by a factor of 10 for visibility, 
(b) The projected (theoretical) mass in apertures which is used to 
scale the GH moments; (c)-(f) Vi,<ji,h3,ti4 with error bars used in 
the model fits. 



Two simulated data sets were constructed from this 21 
model: 

Data set A was designed to simulate kinematical data 
obtained by STIS on HST. The 21 model was "observed" 
at STIS resolution (0.1") in 0.05"x0.1" apertures from 
-1.5" to 1.5" along the major axis and the HST PSF 
was applied. The LOSVDs were extracted in each aper- 
ture and sampled at 5 km s _1 intervals. These LOSVDs 
were then used to compute the projected velocity V 
and velocity dispersion a as well as the first six GH 
moments at each aperture position. In addition, the 
LOSVDs were resampled at two other velocity spacings: 
40km s _1 (comparable to that velocity resolution of the 
STIS spectrograph ~ 38 km s _1 ) and at 100 km s _1 , 
corresponding approximately to the velocity resolution 
of the FOS spectrograph (used to observe M32 by van 
der Marel et al. [1997] and to observe NGC 3379 by 
Gebhardt et al. 2000a). 

Data set B was obtained by "observing" the 21 model 
with the same set of apertures and PSFs as in the data 
compiled by vdM98 and used by those authors in the con- 
struction of 31 models for M32. These data, consisting 
of combined data sets from the WHT, CFHT and FOS, 
are the same data used in constructing our 21 model. 

Since these are simulated data, there are no errors and 
no scatter in the data points. There are two ways in 
which "pseudo-errors" may be assigned to data points. 
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First all velocities and velocity dispersions, and GH mo- 
ments can be assumed to have a fixed error (e.g. we 
choose an error of 10 km s _1 in V and c, and /13 and /14 
were assumed to have errors of 0.1). Such error values 
are fairly typical of those associated with real HST/STIS 
data and CFHT data but somewhat larger than the er- 
rors associated with the WHT data. Alternatively the 
pseudo-data can be assumed to have the same errors at 
each point as the real data. 

In addition to error, real data have scatter. In the 
interest of keeping the number of free parameters to a 
minimum the pseudo data sets A and B do not have any 
scatter. This could affect the solution space by allow- 
ing models that are systematically different but not too 
far off to give equally good fits to the data, where one 
might have been harder to fit had there been appreciable 
scatter. 

In order to introduce scatter into the pseudo data in a 
meaningful way we would need to run models for a vari- 
ety of different levels of scatter to determine how scatter 
affects the results. Such a study is beyond the scope of 
this paper. However in order to ensure that the results 
are not purely an artifact of the "pseudo" nature of the 
data, in addition to these simulated data sets, we also 
applied our modeling algorithm to the actual kinemati- 
cal data in vdM98. We refer to these data as data set C. 
Of course, data set C can not serve as a test of our algo- 
rithm since we do not know the true "model parameters" 
of M32! However these data do allow us to compare our 
results with those of vdM98, and to test the sensitivity of 
the derived parameters for M32 on the number of orbits 
in the library, etc. 

In what follows, unless stated otherwise, black hole 
masses are expressed in units of 10 6 A4q and mass-to- 
light ratios in solar units in the y-band. 

5. FITS TO DATA SET A - CONSTRAINING M. FROM 
NUCLEAR DATA 

We first apply our modeling algorithm to various sub- 
sets of data set A. Data set A consists of kinematics 
within 1.5", "observed" in such a way as to mimic obser- 
vations of galactic nuclei with HST/STIS. In addition we 
include mass constraints out to 100". Figure^shows the 
entire data set; the total number of constraints is 571. 
No regularization (smoothing) constraints were imposed 
in any of the models in this section. 

In order to test the dependence of the modeling results 
on the number and type of data points supplied to it, we 
defined restricted data sets as follows: 

a) A total of 98 constraints, consisting of the masses 
in 56 cells (every third radial cell and every third polar 
angle), and vi and o\ as measured in every third aperture. 

b) A total of 164 constraints, consisting of the masses 
in 102 cells (every other radial cell and every other polar 
angle), and vi and 07 in 31 apertures. 

c) A total of 226 constraints, consisting of the same 
mass constraints as in (b), as well as vi, ai, and the GH 
moments /13 — /14 measured at the same positions as in 
(b). 

d) All 571 constraints, consisting of 19 x 14 cell masses, 
and vi, ai and h — /14 in all 61 apertures. 



= 15.5 
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Fig. 2. — Contour plots of x 2 (M.,T) for models constructed 
to fit various subsets of data set A. The star indicates the true 
model parameters. Left column: The number of orbits used in 
the solutions was fixed at N = 1430. (a) Fits to vi and 07 only, 
coarsely sampled; N c = 98. (b) v\ and 07 only, finely sampled; 
N c = 163. (c) All four GH moments, finely sampled; N c = 225. 

(d) All four GH moments, very finely sampled; N c = 571. (N c 
includes mass constraints.) Right column: Fits were carried out 
using the same data as in the left column, but now the number of 
orbits has been varied in order to keep N /N c constant at 15.6. 

(e) N = 2451 (f) N = 3412 (g) N a = 8928. When the ratio 
of orbits to constraints is kept constant, increasing the number of 
data points has little effect on the tightness of the x 2 contours. 



We did not explicitly include the aperture masses shown 
in Figure^ (M° ) in the fits (although they are implic- 
itly included as described in § 12.31 ) However we verified 
that the aperture masses were always fitted to better 
than 0.1% for this data set. 

The left column of Figure |21 shows how the \ 2 contours 
change as the number of constraints is increased, given 
a fixed number of orbits, N a — 1430. It is clear that 
the lowest velocity moments U; and 07 contain almost no 
information about M, or Ty: only when the higher GH 
moments are added do the \ 2 contours begin to exhibit 
a definite minimum. However the best-fit parameters 
in Figure [2J1 are substantially displaced from their true 
values and plots of the predicted kinematics confirm that 
the fit to the data is poor. 

A possible explanation for the offset and for the poor fit 
when the number of data constraints is large, is the small 
ratio of orbits to constraints in FigureElI, N a /N c = 2.5. 
This modest ratio - while typical of the published mod- 
eling studies (e.g. vdM98) - suggests that our algorithm 
did not have much freedom to explore different orbital 
solutions. To test this idea, we repeated the experiments 
but this time increased the number of orbits in step with 
the number of constraints so as to keep the ratio N /N c 
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Fig. 3. — Contour plots for a fixed set of observational con- 
straints (same as in Figure 2d, N c = 571) but different numbers 
of orbits, as indicated. The conclusions drawn from this data set 
about the best-fit model parameters M. and Ty and their uncer- 
tainties would depend very strongly on the number of orbits used 
in the modeling. The properties of the models labelled A- D arc 
illustrated in Figure IB1 



fixed. The results are shown in the right panel of Fig- 
ure [21 The differences are striking: we now see that the 
topology of the first set of contours was an artifact of 
the small number of orbits used. When the number of 
orbits is increased from 1430 to 8928 - i.e. when the ra- 
tio of orbits to constraints is increased from 2.5 to 15 - 
the minimum in \ 2 disappears, leaving only a broad % 2 
plateau. The true set of model parameters lies within this 
plateau although there is no sense in which this model 
can be said to be "preferred." Evidently, even the full set 
of GH moments can only weakly constrain the potential 
when the modeling algorithm has the freedom to con- 
struct a wide variety of orbital populations. It must be 
emphasized that in the absence of smoothing constraints 
the actual number of orbits actually used by the opti- 
mization routine is roughly equal to the total number of 
constraints, irrespective of the size of the orbit library. 
Increasing the size of the orbit library basically increases 
the availability of orbits with the right kind of properties 
in the right part of phase space. 

In these experiments, the number of observational con- 
straints was varied. More typically one is faced with a 
fixed number of measurements. Figure OH shows what 
happens when N c is fixed - we used the full data set 
A, with N c = 571 - but the number of orbits is varied. 
Again we see that the topology of the x 2 plot depends 
strongly on the ratio of orbits to constraints. As N a /N c 
increases from 2.5 to 5.0, the \ 2 contours shift so that 
their apparent center is close to the true model parame- 
ters, but as N a /N c is increased still more, all semblance 
of a unique x 2 minimum vanishes and the potential pa- 
rameters become essentially unconstrained. Indeed it is 
not clear from these plots whether we have reached a 
limit; the \ 2 valley may become even broader as N Q /N C 
is increased above 15.6. In the plots with the two largest 
values of N a /N c , models lying within the \ 2 plateau pro- 
vide essentially perfect fits to the kinematical data and 
each of the mass constraints is fit to better than one part 
in 10 6 . Figure 2] shows the quality of the fit to the data 
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Fig. 4. — Fits to the kinematical data (Fig. 1) for two orbital 
solutions that lie within the \ 2 valleys of Fig. 3, close to the true 
model (*). Solid line: N a /N c = 5; dashed line: N a /N c = 2.5. 
Models constructed using the two larger values of N /N c shown in 
Fig. 3 provide almost perfect fits to these data; those fits are not 
shown here. 
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Fig. 5. — 1-D cuts through the \ 2 plots of Fig. [3] all taken at 
Ty = 2. The vertical arrow indicates the location of the true model 
parameter, M. = 2.625 X 10 6 .Mq. When the number of orbits used 
is small, there is a definite, but spurious, \ 2 minimum. As N is 
increased, this minimum broadens into the perfectly flat plateau 
characteristic of under-determined problems. The true model pa- 
rameters lie on that plateau but can not be unambiguously recov- 
ered. 



in the cases N D /N C = 5.0 and 2.5; the most significant 
deviations are in /14. 

Figure [3] shows ID cuts through the \ 2 plots of Fig- 
ure|31 all taken at Ty = 2. As the ratio N Q /N C increases, 
two things happen: the absolute value of \ 2 drops, re- 
flecting the better quality of the fit as the number of 
orbits is increased; and the x 2 - vane y becomes broader. 
The plateau of precisely-constant x 2 predicted in §3 is 
very clear for N /N c > 5. The true value of M, lies 
within this plateau but there is no sense in which it is 
preferred. This behavior of the \ 2 plots as N Q is varied 
was first predicted by Merritt & Ferrarese (2001) (their 
Fig. 7). 

The internal velocity dispersions in four models (la- 
beled A-D in Figure are shown in Figure The 
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Fig. 6. — The intrinsic velocity dispersions oy, a^, ug as functions 
of radius for models A-D in Fig. [3j All models have comparable 
X 2 values and Ty = 2. Black hole masses are: A, 1 X 10 6 Mq; 
B, 2.66 x 10 6 M e ; C, 6. x 10 6 A4 Q ; D, 8.5 x 10 6 M Q . The values 
of T and Mm used in constructing Model B are closest to the true 
values. This model is approximately isotropic (a> fa erg), as was 
the 21 model from which the data were generated. 
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Fig. 7. — x 2 contours for fits to the data from data set A, to which 
has been added the simulated data from the "WHT" ground-based 
apertures. 



models all have % 2 values comparable to the model with 
the true potential parameters. Close to the center, the 
model with lowest M. (A) has a significantly larger num- 
ber of stars on radial orbits than the models with large 
M. (C-D); the increase in oy is needed to keep the central 
velocities high in spite of a too-small black hole. Never- 
theless, so great is the freedom to choose different or- 
bital populations that even knowledge of the projected 
GH moments can not rule out these extreme models. 

It is essential to point out that part of the indetermi- 




M. M. 

Fig. 8.— x 2 contours for fits to the full LOSVDs in all 61 
apertures of data set A; N c = 1198. N a = 8928, 5775, 2863, 1430 
in a-d respectively. 



nacy illustrated in Figure might be due to the fact that 
the data of data set A are restricted to the region near 
the black hole; hence the model kinematics are not at all 
constrained at large radii. This means that the modeling 
algorithm has unlimited freedom to vary the properties 
of the model at large radii while fitting the small radius 
data. In order to test if this is the sole cause of the inde- 
terminacy we show in Figure dhow the x 2 contours are 
modified if, in addition to data set A (kinematical data 
extending to 1.5"), the modeling algorithm is given the 
additional 44 data points (including the first 4 moments 
of the LOSVD at each point) from data set B that cor- 
respond to the ground-based WHT observations along 
all position angles (kinematical data extending out to 
8"). We see once again that when the full orbit library 
of ~ 9000 orbits is used a long flat \ 2 valley which is 
somewhat more restricted in M. results. 

As an alternative to fitting GH moments, one can fit di- 
rectly to the LOSVDs from which the GH moments were 
derived (e.g. ijMerritt 19 97)'). This procedure is expected 
to be inefficient if the LOSVDs are nearly Gaussian since 
measurements at many distinct velocities are required to 
reproduce accurate estimates of just the lowest-order GH 
moments. But direct use of the LOSVDs may be ad- 
visable near the centers of galaxies where observations 
can reveal extended wings due to high-velocity motion 
around the black hole (e.g. l) Joseph et al. 2001]) '). wings 
that are poorly represented by the lowest terms in a GH 
expansion. 

Figure [S] shows x 2 contours for fits to the full LOSVDs, 
sampled at Av ~ 40km s _1 . This velocity sampling 
is approximately equal to the velocity resolution of the 
STIS spectrograph at 8500A. (The velocity scale of the 
the STIS spectrograph at ~ 8500A is ~ 19km s _1 per 
pixel. Thus two pixels in the spectral direction (Nyquist 
sampling) imply a velocity resolution of ~ 38km s _1 ). 
A more pragmatic justification is that sampling at Av ~ 
40km s _1 already implies 1198 constraints and halv- 
ing the velocity spacing would increase the number of 
constraints to over 1800, requiring a prohibitively large 
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N/N = 1.2 





Fig. 9.— 1-D cuts through Figure for T v = 2. The unique 
minimum in x 2 that appears when the number of orbits is small, 
becomes a perfectly fiat plateau when N is large, indicating that 
the estimation of Mm from these data is under-determined. 



number of orbits for the modeling. We carried out 
optimizations for the same four sets of orbits (N a = 
8928, 5775, 2863, 1430) used to fit the GH moments in 
FigurelHl The total number of data constraints was 1198: 
the same set of 266 mass constraints as in Figure |31 and 
the LOSVDs measured at all 61 apertures along the ma- 
jor axis. The ratio N /N c is smaller than in the plots of 
Figure |3 because of the roughly three times larger num- 
ber of constraints required to represent the LOSVDs. 

In all four panels of Figure |H1 the decrease in N o /N 
relative to Figure E3 results in a slightly smaller allowed 
range of models. But once again, for a large enough 
orbit library, there is an extended region within which 
X 2 is precisely constant. For the smallest orbit library 
(N c = 1430) the true solution lies outside the minimum 
contour and the "best fit" solution is obtained for a larger 
M. and smaller T than those corresponding to the true 
solution. Figure El shows ID cuts through Figure for 
Yy = 2. The constant-x 2 plateau appears for N a /N c > 
5. 

In order to make a more reasonable comparison be- 
tween the quality of the fits to the LOSVDs and to the 
GH moments, we defined a new statistic Xkin' wmc h mea- 
sures only the goodness of fit to the kinematical data in 
each aperture i.e. V, a, /13 and /14, rather than the value 
of the objective function (which in this case includes the 
LOSVDs). (The x 2 of the fit to the mass constraints 
is also excluded from Xkin Du ^ i s < 10~ 3 everywhere). 
When 8928 orbits were used, fitting to the LOSVDs gave 
a minimum Xkin ~ 0.416, while fitting to the GH mo- 
ments gave Xkin = 0.0442. (Although there is nearly an 
order of magnitude difference in the two numbers, the 
two fits are indistinguishable to the eye and both are vir- 
tually perfect.) Thus we conclude that fitting to the GH 
moments may be adequate even when the LOSVDs have 
large wings, as in the case of our central aperture. 




'lVN a =13.2 



M. M. 

Fig. 10.— Fits to the LOSVDs sampled with Av = 40 km s" 1 
(left panel) and Av = 100 km s _1 (right panel) at all 61 apertures 
using the full library of 8928 orbits. 



Prior to the installation of STIS aboard HST, the faint 
object spectrograph (FOS) was used to observe the nu- 
clei of galaxies with high spatial resolution, although its 
velocity resolution was only ~ 100km s _1 . Due to the 
difficulties associated with reducing the FOS data, only 
a few of the galaxies observed with the FOS have been 
modeled. These includ e M32 (vdM98) and NGC 3379 
l|Gebhardt et al . 2000a). vdM98 used V/ and a x as de- 
rived from the FOS observations in their modeling of 
M32, while Gebhardt et al. (2000a) attempted to ex- 
tract the central few LOSVDs in NGC 3379, sampled 
at 100 km s _1 spacing. In their most recent paper Geb- 
hardt et al. (2003) modeled the kinematics of 12 galaxies 
with nuclear data from STIS. In all cases they sample the 
LOSVDS with only 13 points with typical velocity spac- 
ing of ~ 100km s _1 . In Figure El we compare the fits 
to LOSVDs sampled at 40 km s _1 and 100 km s _1 at 
all 61 apertures using the full orbit library of 8928 or- 
bits. This plot shows that when LOSVDs are coarsely 
sampled with Av — 100km s _1 , a much larger region 
of parameter space can fit the data equally well and the 
model parameters are not well constrained. Figure 1111 
shows ID cuts through Figure El a t Ty = 2. For the 
model closest to the "true" model (M. = 2.66, Ty = 2), 
X 2 = 0.416 and x 2 = 0.084 for Av = 40km s" 1 and 
At; = 100km s^ 1 respectively. 

From these x 2 values, one might conclude that all mod- 
els close to the bottom of the x 2 valley would give equally 
good fits. However, it is once again essential to compare 
how the kinematics would be fitted if all the informa- 
tion in the best sampled LOSVDs were used. To do this 
we use the orbital weights provided by the fits to the 
LOSVDs sampled at 40 km s _1 and 100 km s _1 but co- 
add the appropriately weighted GH moments computed 
from the orbital LOSVDs sampled at 5 km s _1 . Fig- 
ure El shows the fits the GH moments for models lying 
on the plateau of the x 2 valley with each of the two ve- 
locity spacings. It is clear that fitting coarsely-sampled 
LOSVDs gives a much poorer fit to the kinematical data, 
especially for the higher-order GH moments, e.g. /14. 
This is despite the fact that they are an almost perfect fit 
to the coarsely sampled LOSVDs! This quality of the fit 
worsens even more at points further away from the true 
model as shown by the steeply rising and highly variable 

Xki: 



values plotted in Figure El This is understandable 
since both V and a at large radii are ~ 50km s _1 roughly 
half the spacing between points in the LOSVD! 
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Fig. 11.— 1-D cuts through Figure [Tol at Ty = 2. The solid 
line is for Av = 40 km s _1 and the dashed line is for Av = 100 
km s — 1 . Arrow marks true value of M. . 
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Fig. 13.— 1-D plot of xl in for T v = 2 for fits with the two 
different velocity spacings (Av = 40km s — 1 and Av = 100km s _1 ). 
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Fig. 12. — Fits to the kinematical data (Vi , a, /13, /i-i)) for models 
from Fig. 10 with M. = 2.66, T v = 2. The solid line is the fit 
obtained with At; = 40km s _1 (x^in = 4.16 X 10 _1 ) and the dashed 
line is the fit obtained with Av = 100km s _1 (x£ in = 32.5). 



The suitability of LOSVDs sampled at ~ 100km s _1 is 
likely to be particularly bad in compact low luminosity 
ellipticals like M32 where the central velocity dispersion 
is < 150 km s _1 but may be less problematic in large 
giant ellipticals where the central velocity dispersion is 
~ 250 — 300 km s^ 1 . It is clear however that using a 
fixed number of grid points per LOSVD for all galaxies 
could produce non-uniform results. This implies that 
it is essential to tailor the modeling parameters to each 
galaxy. 

6. FITS TO DATA SET B - A 21 MODEL OF M32 

Data set B was obtained by "observing" the 21 model 
through exactly the same set of apertures, and with 
the same PSFs, as in the observations of M32 (van der 
Marel et al. 1994a; Bender et al. 1996; vdM98; Joseph 
et al. 2001) that were used to construct the 21 model 



described in section § 3. vdM98 used this same set of 
observations in building their 31 models of M32 and es- 
timating the black hole mass. Figure ITU shows that data 
set B is not a perfect match to the actual M32 data 
although it reproduces the kinematics near the central 
black hole very well. Error bars on the pseudo dataset 
were defined as described in § 

In Figure IT31 we repeat an experiment first carried out 
by vdM98 in their analysis of the actual M32 data (see 
their Appendix A). We fixed the number of orbits in 
our 31 modeling algorithm at iV G = 1982 - similar to 
the number (1960) used by those authors - and explored 
how the x 2 contours change as we apply progressively 
larger numbers of observational constraints, as follows 
(all from data set B): (a) major axis V and a in the 
WHT and CFHT apertures; (b) major axis V, a, h 3 , hi 
(WHT, CFHT); (c) major and minor axis V, a, h 3 , ha 
(WHT, CFHT); (d) V, a, h 3 and h 4 along all position 
angles (WHT, CFHT); (e) all constraints in (d) plus V 
and a from the HST/FOS apertures. Each of these fits 
included 266 meridional-plane mass constraints within 
100". 

As in the previous section none of the models discussed 
in this section were constructed with regularization con- 
straints imposed. 

Figuresll5l andlT^l show how the constraints on M. and 
T appear to tighten as the number of data points pro- 
vided to the modeling algorithm are increased. When 
only the major-axis "WHT" measurements of V and a 
are used, the potential parameters are almost uncon- 
strained, but when the entire data set is given to the 
modeling algorithm, a well-defined minimum appears in 
X 2 (M,, T) that is reasonably close to the true model pa- 
rameters. vdM98 found a similar dependence of the % 2 
contours on number of data points when modeling the 
true M32 data. 

But Figures IT7I and ITS! tell a very different story. Now 
the fits have been carried out using a fixed ratio of orbits 
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Fig. 14. — Data set B. Solid dots represent the real data to which the 21 model was fitted; solid lines represent the 21 fit; open squares 
represent the points on the fit which were selected as data set B. 



to data constraints, N a /N c ps 10. The rapid shrinking 
of the x 2 contours with increasing N c in Figure IT31 and 
ITo! is now gone: even using the full set of data gives a 
X 2 (M.) plot with an extended flat plateau, stretching 
from M, k, 2 x 10 6 A^ Q to M. w 6 x 10 6 A^ Q . The true 
value, M. = 2.625 x 10 6 .M©, lies on the edge of this 
plateau suggesting that even the large number of orbits 
we used (5856) is barely sufficient to reproduce the true 
X 2 contours. 

The most important conclusion we draw from a com- 
parison of Figures 1151 and 1171 is that the appearance of 
the x 2 contours depends strongly on the flexibility of the 
modeling algorithm. The quality of the fit to the data de- 
pends at least as strongly on the size of the orbit library 
as on the size of the data set. Comparisons between 
fits made with different sets of data are problematic un- 
less care is taken to demonstrate that the ratio N Q /N C 
is sufficiently large for each fit. And for a given data 
set, statements about the best-fit model parameters and 
their confidence intervals can be very strongly influenced 
by the number of orbits used. 

We note that including the "HST/FOS" measurements 
from data set B has almost no influence on the range 
of indeterminacy in M m ; the width of the constant-x 2 
plateau is virtually unchanged (Figure f]~5f) . This sug- 
gests that the FOS data for M32 did not significantly 
tighten the constraints on the mass of the black hole 
in this galaxy compared with the constraints set by the 



ground-based data. vdM98 reached a different conclu- 
sion; comparison of Figures H5l and IT7I suggests that they 
may have been misled by the relatively small number of 
orbits in their modeling algorithm. 

It is interesting to compare these results with 
those obtained using only the subset of data set 
B corresponding to the ground-based, WHT data; 
these (simulated) data have an effective resolution 
FWHM/2r^ w 0.5, better than that of most galaxie s 
observed with HST/ST IS ( ijMerritt fc Ferrarese 2001b|) : 
ijGebhardt et al. 2003(1 ) and their spatial coverage and 
S/N are much greater than those of most STIS nucleus 
data. Thus we extracted from data set B measurements 
at all the WHT apertures of V, a, /13 and /14, includ- 
ing all position angles (430 constraints). Figures IT^l and 
1201 show the results, for three different numbers of or- 
bits, iV = (1982,5674,8352). When the ratio of orbits 
to constraints is largest {N a /N c = 19.4 for N a = 8352), 
excellent fits are obtained for any black hole mass in the 
range 1 x 10 6 7W Q < M. < 10 x 10 6 7W Q ! While there is 
a hint of a minimum at M. w 4.5 x lO 6 ^©, it is well 
removed from the true value of M m and furthermore its 
location is very sensitive to N a . We conclude that these 
data are almost useless for constraining the black hole 
mass. We would expect a similar or greater degree of 
indeterminacy in values of M, derived from many of the 
galactic nuclei observed with HST/STIS. We return to 
this point in § 8. 
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Fig. 15. — Contour plots of the \ 2 that measures the qual- 
ity of the fit to various subsets of data set B (the simulated M32 
data) using 1982 orbits, (a) Major axis V and a (WHT, CFHT 
apertures); (b) major axis V, cr, hg, (WHT, CFHT apertures); 

(c) major and minor axis V, <r, /13, /14 (WHT, CFHT apertures); 

(d) V, a, h 3 , hi along all PAs (WHT, CFHT apertures); (e) all 
of the constraints in (d) plus V and a from the FOS apertures. 
The * labels the true model parameters. N c is the total number of 
constraints including mass constraints. This plot seems to suggest 
that the constraints on M, and T become rapidly tighter as the 
number of data points increases. 
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Fig. 16. — 1-D cuts through Figure 1151 at Tv = 2. Arrow 
indicates true value of M, . 




Fig. 17. — Same as Figure [T51 except that the size of the orbit 
library in each panel has been adjusted such that N /N c is constant 
at ~ 10. The x 2 contours now change much more slowly as the 
number of data points is increased, and even for the full data set, 
the constraints on M, and T are weak. 




M, 

Fig. 18. — 1-D cuts through FigurclTTIat Tv = 2. Even using the 
full data set (N c = 584), there is a plateau of constant \ 2 indicating 
that these simulated data do not uniquely constrain M,. 



Finally we ask if the constraints on M, using this data 
set can be narrowed by adding the simulated HST/STIS 
data. Figure |2*T1 shows \ 2 contours for 31 model hts to 
data set B including all the STIS apertures as well as the 
ground based and FOS apertures. The data were fitted 
at 140 apertures in total (the four outermost STIS data 



points on one side of the galaxy were the only apertures 
in the Joseph et al. (2001) data set that were not fit- 
ted) . Figure [23 shows 1-D cuts through the x 2 -contour 
plot at T = 2. This figure suggests that the addition of 
the STIS data to the existing data for M32 may yield a 
tight constraint on M t : even for the largest orbit library, 
the allowed range of solutions is quite small. We note 
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Fig. 19. — Fits to the subset of data set B corresponding to the 
ground-based, WHT apertures only, for various numbers of orbits; 
N c = 430. 
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Fig. 20.— 1-D cuts through Figure [151 at TV = 2. These data, 
which are superior in quality to most STIS /HST nuclear data, place 
only very weak constraints on M,. 



also that the true solution lies close to the center of the 
minimum in the x 2 valley. 

Figure l2~TT a'l should be considered provisional since the 
ratio of orbits to constraints is ~ 10 and likely to be 
marginally adequate. We will return to this point in a 
later paper when we analyze the observed STIS data for 
M32. 

7. DATA SET C: M32 RE-EXAMINED 

In § 5 and 6 we presented x 2 -plots of fits to two simu- 
lated data sets derived from a model that was based on 
data from M32. Here we show the results of fits to the 
actual data used in the construction of that model, our 




Fig. 21. — Contour plots of \ 2 f° r models constructed to fit data 
set B including HST/STIS apertures as well. The same number of 
constraints (810) are fitted in each panel but the number of orbits 
in the library is varied as indicated on the plots. 
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Fig. 22. — 1-D cuts through Fig. |^ This figure shows that 
for the largest orbit library, the minimum in the x 2 valley is still 
reasonably narrow, suggesting that the HST/STIS data for M32 
may yield tight constraints on M. in this galaxy. 



data set C. These are the same data used by vdM98 in 
their 31 study of M32. The constraints in our data set C 
include meridional plane masses in 266 cells. In the mod- 
eling results presented below, mass constraints were fit to 
an accuracy of better than 3% everywhere and to better 
than 0.1% at all points within the minimum x 2 -valley. 

The models discussed in this section were constructed 
without regularization constraints. The same is true of 
the vdM98 modeling of M32 with which we make com- 
parisons. Our conclusions about the robustness of those 
authors' results with regard to size of orbit library are 
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Fig. 23. — Contour plots of the x 2 that measures the quality of 
the fit to the combined FOS, CFHT and WHT data sets for M32. 
These are the same data fitted by van der Marel (1998) in their 31 
modeling study. M32 is assumed to be edge-on. The four panels 
show the results using four different sizes of orbit library. Model 
parameters are the black hole mass Al. in 10 6 .M© and the V-band 
mass-to-light ratio T in solar units. Dots indicate models that were 
calculated. Labelled positions are models whose fit to the data is 
illustrated in detail in Figure |2*B1 The upper right panel is based 
on the same number of orbits as in van der Marel et al. (1998) and 
shows two distinct x 2 minima, as in their paper. As the number 
of orbits is increased, these two minima merge and broaden into a 
plateau of constant \ 2 - 




Fig. 24. — 1-D x 2 plots along the dotted lines in Figure l23l (a) 
Horizontal cut; (b) slanted cut. The four lines in each plot corre- 
spond to the four different numbe rs o f orbits used in the modeling, 
increasing downward (cf . Figure 1231 . These plots show that the 
local minima appearing for small N a disappear as N is increased, 
yielding a region of nearly constant x 2 stretching at least from 
~ 1 x 10 6 A4 Q to ~ 6 x 10 6 A4 Q . 



Table 1 

x 2 of fit to individual datasets. 



Model 


M. 


Xy 


"X-YOS 


*CFHT 


XwHT 


X 2 


A 


1.4 


2.8 


22.45 


49.24 


39.55 


111.19 


13 


2.4 


2.4 


22.88 


49.32 


28.36 


100.56 


C 


3.3 


2.2 


25.17 


51.33 


26.51 


103.01 


D 


1.0 


2.0 


30.96 


46.97 


28.11 


106.04 


E 


4.5 


1.8 


32.88 


49.38 


27.30 


109.56 


F 


4.8 


1.6 


33.90 


50.20 


25.88 


109.98 



therefore unaffected by questions of regularization. 

Figure [23] shows the results of fitting the full 
data set using four different orbit numbers, N Q = 
(665,1999,5127,8352), or N D /N C = (1.0,3.0,7.6,12.4). 
The top-right-hand panel of Figure 1231 was made using 
almost exactly the same number of orbits as in vdM98. 
This plot exhibits two minima in x 2 (T,Af.); the corre- 
sponding plot in vdM98 (their Fig. 6) also shows two 
minima, although displaced slightly from the two in our 
Figure[2Sb- However as N a is increased, we find that the 
two minima merge into a single, broad plateau. 

In their edge-on modeling of M32 from these data, 
vdM98 selected a model with M. = 3.4 x 10 6 .M© and 
T k, 2 as their best fit. That model lies between the two 
minima seen in the upper-right panel of Figure 1231 and 
somewhere near the center of the constant-;^ 2 plateau in 
the lower panels. It is important to note that the approx- 
imate agreement with the results of vdM98 is a valuable 
test of our code. The fact that the contours of vdM98 are 
not reproduced exactly are a result of discretization and 
differences in the details of the modeling which tend to 
be more obvious since the solutions depend on the small 
numbers of orbits used (~ 2000) . 

In Figure we present cuts along two axes in the 
X 2 (T,M.) plots (indicated by the dotted lines in Fig- 
ure I23[) . One cut is at T = 2 and the other cut lies 
roughly along the minimum of the x 2 -valley. These fig- 
ures demonstrate that for the largest values of N a /N c no 
preferred value for M, in M32 can be found over a range 
in values that extends at least from ~ 1.5 x 10 6 A^q to 



- 5 x 1O 6 A^ . 

We demonstrate the indeterminacy even more clearly 
in Figure which shows detailed fits to the kinematics 
for a set of models lying along the x 2 plateau in Figurel2*3l 
The differences between the various models - which span 
a range of almost a factor of four in M, - are almost in- 
visible, with the exception of the predicted values of a in 
the FOS apertures. This could be interpreted to mean 
that the FOS data contain useful information about M., 
but we consider this unlikely, since none of the models 
fits the FOS data well, due to the large point-to-point 
fluctuations in the FOS velocity dispersions. It is en- 
tirely possible that smoother data, with the same spatial 
resolution as the FOS data, could have been fit well by 
all the models in this set. 2 

In order to make the case for indeterminacy in M m even 
more airtight, we present in Tabled the contribution to 
X 2 from each of the three partial datasets (FOS, CFHT, 
WHT) that make up our data set C. The values of Xcfht 
appear to fluctuate randomly from model A through F 
with no systematic behavior. By contrast, Xfos * s sma ll~ 
est for Models B and A and increases steadily toward 
larger M,. The opposite trend is observed for the fit to 
the WHT data; as a result, the total % 2 remains almost 
precisely constant. (Mass constraints contribute almost 
nothing to the total % 2 since they are fitted to better 

2 Preliminary results of modeling the M32 STIS data of Joseph 
et al. (2001) show that these data can be fit well over a finite range 
in M m , without the variations apparent here in the fits to the FOS 
data (Valluri et al. 2004- in preparation). 
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Fig. 25. — Predictions of selected models from Figure |23 compared with a subset of the M32 data. Models A-F have black hole masses 
ranging from 1.4 X IO^Mq (Model A) to 4.8 X 10®Mq (Model F). All plots show fits to major-axis data; however note that data along 
other position angles were also used in constructing the models and the \ 2 values plotted in Figure |2~31 include the full data. There is very 
little difference in the quality of fit of these models to the CFHT and WHT data. The only visual difference is between the fits to the FOS 
data. The wildly varying velocity dispersions for the FOS data make this harder to fit. These plots along with Table 1, show that black 
holes with masses in the range 1.4 X IQPMq < Mm < 4.8 X 10 s A4q are equally consistent with the data. 



than 0.1% accuracy at all points within the minimum 
X 2 -valley.) Although the lower values of M, provide the 
lowest Xfqs they require too large a value of T to fit 
the large radius data. However the relative difference 
between models A and B or F and B is statistically in- 
significant. 

The internal kinematics of our models must vary with 
ill. and T in order to maintain fixed observables. Fig- 
ure 1261 shows plots of the major axis, internal velocity 
dispersion components for models A through F. The be- 
havior is precisely as expected: near the center, models 
with lower M, maintain a high central velocity disper- 
sion by putting the largest fraction of stars on radial 
orbits; at high M., the central line-of-sight dispersion is 
maintained by transferring more and more stars to nearly 
circular orbits around the black hole. 

8. THE EFFECT OF ADDING REGULARIZATION 
CONSTRAINTS 

In order to test the effect on the solutions of adding 
smoothness constraints, we ran a series of models to 
fit the full set of kincmatical and mass constraints for 
pseudo data set B, with various values of the smoothing 




0.5 1 1.5 2 0.5 1 1.5 2 

log (r) log (r) 



Fig. 26. — Intrinsic 3-D kinematics along the major axis for each 
of the models A-F. Black hole masses ranging from 1.4 X 10 6 A4q 
(Model A) to 4.8 x 1O 6 A4 (Model F). 



parameter A in equation (|14|) . In this set of runs, the 
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errors on the data were selected to be the same as those 
of the real data. Since regularization is computationally 
expensive, we ran this series of models with only 5000 
orbits instead of the full orbit library. 

Each plot in Figure """71 shows x 2 P er constraint versus 
M. for TV = 2. For each choice of (M.,T V = 2), the 
average level II of noise in the solution was computed 
via equation Ijl7jl in § 12. 41 and the value of II is indicated 
by the height of the solid bar at each point. Since II 
varied by more than four orders of magnitude as A was 
varied, the height of the bar has been rescaled in each 
plot as (IL. — n m , n )/n m j n . In addition the plots give 
the quantities scale = (Tl m ax — n m j„)/II rn j„ and II, the 
mean of all the noise values in a given plot. Arrows in 
each plot mark the position of the true value of M. . 

The primary conclusion to be drawn from Figure [2*71 is 
that adding regularization constraints does not suddenly 
or dramatically reduce the degeneracy in M,. Although 
the mean level of noise (II) drops by a factor of ~ 500 as 
A increases from A = 10~ 8 — 10~ 2 , the flat x 2 plateau per- 
sists over this range with negligible decrease in the width 
of the plateau. There is no indication that the algorithm 
achieves good fits for incorrect values of M. by select- 
ing spuriously noisy solutions. Indeed, the noise level 
is roughly constant along the constant-x 2 plateaus, and 
rises sharply only outside; we provisionally interpret this 
to mean that all solutions along the plateau are "equally 
good" and that the algorithm does not need to construct 
highly artificial solutions in order to achieve its good fits. 
As A is increased beyond ~ 1, the extended plateau is 
replaced by a true minimum in x 2 ; this is a necessary 
consequence of the smoothness constraint, which begins 
to penalize solutions characterized by sharp phase-space 
gradients, even if they reproduce the data. However, the 
regularization does not seem to have any special ability 
to select out the correct M m : instead, as A is increased, 
the best-fit M, systematically drops. This is also ex- 
pected, since there is no reason for the true solution to 
also be the "most smooth" as defined by any particular 
choice of penalty function. 

Figure |2H1 shows how the best-fit value of M, and the 
range of acceptable M, values varies with the level of 
imposed smoothing. We defined the range in M. by 
Ax 2 = 0.5. For small values of A, there is no well-defined 
minimum in x 2 an d we defined the best-fit value as the 
value of M. at the center of the Ax 2 = 0.5 region. While 
there does exist a particular value of A (A « 1) for which 
the best-fit M, is close to the input value, there would 
seem to be no way to guess this value based only on a 
plot like Figure [2*51 When log A is increased just slightly 
above this value, the best-fit M. drops below its true 
value as the smoothness constraint begins to bias the so- 
lution toward overly-smooth phase space distributions. 
In other words, the optimal value of A is only slightly 
smaller than the value at which the solutions begin to be 
seriously biased. 

Several authors have based their choice of an optimal 
smoothing parameter on the ability of their algorithm to 
reproduce a specific 21 distribution function (e.g. Ger- 
hardt et al. 1998, Cretton et al. 1999, Verolme & de 
Zeeuw 2002). There are two potential problems with this 
approach: first it has been shown that even for a known 
distribution function, the optimal value of A depends on 



the choice of data set (e.g. Cretton et al. 1999); second 
this choice of A is not guaranteed to give the underlying 
distribution function - but just one of the 31 distribution 
functions that is close to a 21 form. This is unlikely to be 
useful in the general case where the distribution function 
could deviate significantly from the 21 form. One might 
be particularly concerned about its applicability in mod- 
eling integral field data for galaxies with significant non- 
21 features: (counter rotating disks, cores etc). In such 
cases the use of a smoothing parameter optimized to re- 
cover a 21 distribution function could artificially restrict 
the models. 

In Figure "2HI we plot full x 2 plots for 4 values 
of log(A). The contours trace the value of Ax 2 = 
(x 2 — Xmin)- The first two contours are at A^ 2 
0.5. I . Subsequent contours are at spacings of A^ 2 = 
2.3,4.6,6.2,9.2,11.8,18.4, which are the 68.3%, 90%, 
95.4%, 99% 99.7 3% and 99.99% con fidence regions on M. 
and Ty jointly ijPress et al. 1992|) . The grey scale rep- 
resents the noise. In each plot white represents the least 
noisy model and black represents the most noisy model. 
As in the case of the 1-D x 2 plots it is evident that an 
elongated x 2 minimum persists in both parameters even 
with moderate to high levels of smoothing. Once again, 
noise values do not appear to vary much within the min- 
imum valley and are comparably low through the entire 
Ax 2 = 0.5 contour. There is also no indication that 
the models at the extremes of the contours are signifi- 
cantly more noisy than the models at the center. The 
regions with the largest amount of noise are also the re- 
gions where the x 2 values are very large. Interestingly 
for models at the top right of the plot, the x 2 values are 
large but the models have little noise. There appears to 
be little if any correlation between the noise levels and 
the distance from the x 2 valley. 

These experiments are consistent with the view 
that the potential estimation problem is inherently ill- 
conditioned, and that regularization while it can artifi- 
cially reduce the solution space can not overcome the 
degeneracy. We note a subtle but important distinction 
here between the role of smoothing in 21 and 31 modeling. 
In the 21 case, there does exist a unique (smooth) / corre- 
sponding to any assumed potential, and the imposition of 
smoothing constraints might be expected to assist in th e 
recovery of that unique f ( 'Merritt fc Fridman 1996» : 
(iJalali fc de Zeeuw 2002Tk'yerolme fc de Zeeuw 2002f) : 
( Cretton & Emsellem 2003)). Something similar must 
happen in the 31 case, but as our experiments show, the 
additional freedom associated with a 31 / allows many 
potentials to be fit with orbital-space populations that 
are comparably smooth. This is just what one expects 
when solving an under-determined problem: smoothing 
alone can not overcome the degeneracy. Furthermore, as 
Figure EH shows, there is a real danger associated with 
regularization: if the smoothing parameter is too large, 
the best-fit M. is biased. Indeed this figure suggests that 
any A large enough to give a "best-fit" M, - i.e. a unique 
minimum in x 2 ~ is also large enough to bias the loca- 
tion of that minimum. Hence the conservative approach 
to modeling would be to use little or no regularization, 
even if doing so means that no "best-fit" potential pa- 
rameters will be found. 

Finally, we note here a formal similarity between the 
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Fig. 27. — Plots of \ 2 versus M. and for T\ 



2 for a sequence of smoothing parameters (A). The quantities n and scale are 
defined in the text. In each plot the arrow marks the position of the "true value of M," . At each data point the height of the bar is 
oc (II - n min )/(n max - II min ) for the model. 




-10 -8 -6 -4 -2 

log(A) 

Fig. 28. — Variation of the mid point of the x valley as a 
function log(A). Error bars indicate the range within which Ax 2 = 
0.5. The broken line represents the "true" value of M, for the 
pseudo data. 




Fig. 29. — 2-D \ 2 contour plots for 3 different values of smooth- 
ing parameter log(A) as indicated. The contours represent values 
of Ax 2 = x 2 — Xmi n with the minimum contour at Ax 2 = 0.5. Sub- 
sequent contours are at intervals of Ax 2 as indicated in the text. 
The grey scale represents the noise. In each plot white represents 
the least noisy model and black represents the most noisy model. 
As before the star indicates the position of the "true model" . 



various sorts of "constraint" that can be imposed on /. 
Forcing / to depend only on E and L z ; restricting the 
number of orbits from which a 31 / is constructed; or 
forcing / to be smooth, via some regularization scheme, 



all have the effect of artificially removing the degener- 
acy in the potential estimation problem, and converting 
flat x 2 contours into contours with a unique minimum. 
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We consider each of these approaches to be dangerous. 
While any combination of $ and / that fits the data 
is acceptable, statements about the range of acceptable 
potentials depend critically on how flexible the modeling 
algorithm is at representing different forms of /. This 
consideration is particularly important when attempting 
to estimate the value of M, in galactic nuclei, since ar- 
tificially restricting / may give the mistaken impression 
that a particular value of M, is preferred, when in a fact 
a model with M, = can fit the data equally well. 

9. DISCUSSION 

We have shown that the results obtained from stel- 
lar dynamical modeling of galaxy centers can depend as 
strongly on the flexibility of the modeling algorithm as on 
the number and nature of the observational constraints. 
Estimation of the parameters M, (black hole mass) and 
T (stellar mass to light ratio) that define the gravita- 
tional potential is typically an under-determined (de- 
generate) problem, and 31 (three-integral) modeling can 
(and, we believe, often does) generate spurious, "best- 
fit" model parameters that bear no special relation to 
the true parameters. We demonstrated this in the case 
of previously-modeled data for M32: increasing the num- 
ber of orbits by a factor of ~ 4 above what was used in 
the earlier studies led us to substantially different conclu- 
sions about the most likely value of the black hole mass 
and its uncertainty in this galaxy. Indeed, we found that 
no single value of M, was preferred and that values for 
M. in the range 1.5 x 10 6 7W Q to 5 x 10 6 7U q could repro- 
duce the data with no appreciable change in the goodness 
of fit (Figures I23I25|1 . We argued that the degeneracy is 
not a numerical artifact, but derives instead from the 
great freedom available to fit a given, oblate-spheroidal 
mass distribution using a 31 distribution function. 

Our work raises three questions about published and 
ongoing modeling studies of galactic nuclei. 

1. Does a given data set contain enough information 
to distinguish a best-fit value of M., or is M, in- 
determinate, and if so, over what range of values? 

2. How can one be certain that a given modeling al- 
gorithm accurately reproduces the true interval of 
(M m> T) values allowed by the data? 

3. What is the role of regularization (via maximum 
entropy or any other scheme) in reducing the solu- 
tion space? 

With regard to the first question, we showed that the 
degeneracy in M, is substantial even for one of the best 
available data sets, the pre-STIS data for M32 (vdM98). 
These data resolve the sphere of influence of the black 
hole (FWHM/2r, l » 0.25 assuming M. « 3 X 10 6 .M Q ); 
include Gauss-Hermite moments up to he; extend out- 
ward to ~ 10r^ along several position angles; and have a 
high signal to noise ratio. Furthermore, by virtue of its 
high rotation, M32 allows tighter constraints to be placed 
on the orbital distribution and on M m than in "hotter" 
stellar systems. Nevertheless our constraints on M, were 
weak, spanning a factor of ~ 3.5. 



We expect the degree of degeneracy in quantities like 
M, to depend on the quality of the data, in particu- 
lar on the degree to which the black hole's sphere of 
influence is resolved. We demonstrated this explicitly 
in §5-6 using our simulated and real data sets: placing 
useful constraints on M. requires an effective resolution 
FWHM/2r/ 1 that is substantially less than one. Another 
relevant factor is the radial extent of the data, which de- 
termines how well the mass to light ratio is constrained 
(cf. Figure 0. 

With regard to the second question, we showed that 
X 2 contours for the simulated and true M32 data sets 
change significantly when the number of orbits used in- 
creased from twice to in excess of 10 times the total num- 
ber of data points (kinematical plus mass constraints). 
Fully general statements about the minimum number of 
orbits required to explore the full extent of the allowed 
solution space are impossible to make, since some data 
points are clearly more useful than others for constrain- 
ing quantities like M.. For instance, we argued (§ 5) 
that direct fitting to LOSVDs is less efficient than fitting 
to Gauss-Hermite moments, in the sense that more or- 
bits are required in the former case to achieve the same 
degree of modeling flexibility. In one data set treated 
above (the simulated WHT data, Figures and EOJ , 
the shape of the x 2 contours continued to change as the 
ratio of orbits to constraints was increased from ~ 10 to 
~ 20 and it is conceivable that even more orbits would 
be required to reproduce the true extent of the indeter- 
minacy in M, . A number of different factors are likely to 
influence the minimum number of orbits required to con- 
strain M, and the distribution function of the surround- 
ing spheroid: the type of galaxy, the quality, nature and 
dimensionality of the data (spatial and spectral resolu- 
tion, single slit, multiple slits, 2D spatial coverage), the 
true internal kinematics of the galaxy, etc. 

The importance of 2D coverage to constrain 31 distri- 
bution functions has been recently illustrated by mod- 
eling studies based on data from integral field spectro- 
graphs such as SAURON on the WHT. Verolme et al. 
(2002) presented 31 modeling of M32 based on data from 
the SAURON as well as HST/STIS data from Joseph et 
al. (2001). It is likely that 2D data such as those pre- 
sented by Verolme et al. (2002) are able to greatly reduce 
the degeneracy which we demonstrated in single- or mul- 
tiple slit data sets. They showed for instance that 2D 
data are significantly better at constraining the mass- 
to-light ratio T, than are multiple long slits. However 
the number of data constraints modeled by them was 
~ 8000 and the number of orbits used was only 1960, 
giving N Q /N C < 0.25. Mathematically a unique solution 
will always be found if N a < N c . The well-defined min- 
ima in their x 2 (T,Af.) plots could be a consequence of 
the smaller ratio of orbits to constraints, or could mean 
that their data have overcome the degeneracy. Testing 
the latter hypothesis will be difficult however given the 
large number of orbits (N a > 10 5 ) that would be re- 
quired. 

As pointed out in § El it has not been demonstrated 
mathematically that it is possible to construct a unique 
31 distribution function from projected kinematical, 
surface brightness data no matter how perfectly the 
LOSVDs are sampled. It is often stated (e.g. Cappellari 
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et al. 2003b) that 2D kinematical coverage is essential 
to constrain the orbital structure in a galaxy from ob- 
servables. Thus it would appear that limited data (e.g. 
slits along one or more axes) are guaranteed to be insuf- 
ficient. Most published estimates of black hole masses 
are based on multiple slit data and are therefore likely to 
suffer from this indeterminacy. 

If the potential estimation problem is generically 
under-determined, why has the degeneracy escaped gen- 
eral notice until now? In fact the degeneracy is appar- 
ent in a number of published modelling studies. Two 
examples are the Gebhardt et al. (2000a) study of 
NGC 3379 and the Cretton & van den Bosch (1999) study 
of NGC 4342. In the former study, the modeling used 
6400 orbits compared with 702 kinematical constraints 
and 100 mass constraints, or N a /N c — 8.0. Goodness-of- 
fit contours generated from 31 models show a plateau of 
nearly-constant x 2 extending from ~ IQ & M.q to at least 
~ lO 8 7Vi0 (their Fig. 7). In fact the authors state that 
"the difference between the no-black hole and black hole 
models is so subtle that one can barely discriminate those 
models" (cf. their Fig. 11). Gebhardt et al. nevertheless 
argue for an (inclination-dependent) best-fit value of M. 
based on the (puzzlingly asymmetric) wings of the cen- 
tral LOSVD as derived from FOS data. In the Cretton 
& van den Bosch (1999) study of NGC 4342, the authors 
again found that a model with no black hole provides 
"fits to the actual data [that] look almost indistinguish- 
able" from that of a model with M, = 3.6 x 10 8 M Q , 
the claimed best-fit value (cf. their Fig. 8). This study 
used 1400 orbits compared with ~ 250 constraints, or 
N /N c 5.6. We note that both of these data sets have 
FWHM/2r,i ~ 0.5 (if M. is computed from the M. - a 
relation), consistent, according to our analysis, with the 
fact that no best-fit value of M, could be found. 

Since about 1999, it has been common practice in 31 
modeling to include smoothness constraints on the or- 
bital weights, in the form of maximum entropy or some 
other regularization scheme (e.g. Cretton et al. 1999, 
Gebhardt et al. 2000a; Verolme et al. 2002). We showed 
above that imposing smoothness constraints has effects 
similar to those of other, ad hoc restrictions on the form 
of /: they reduce the flexibility of the modeling algorithm 
to adjust / in response to changes in <&, and therefore 
tend to "select out" a particular $ as preferred. When 
the smoothing is kept at a level too low to bias /, the 
X 2 contours on M, show the correct, perfectly-flat form 
associated with ill-conditioned estimation problems. We 
believe that it would be appropriate to repeat a number 
of the published modeling studies, giving careful atten- 
tion to the role of regularization on the range of allowed 
solutions. 

Standard practices for estimating and describing con- 
fidence intervals will need to be changed when deal- 
ing with indeterminate problems like the estimation of 
M, in galactic nuclei. Quoting a black hole mass as 
5.0 ± 2 x 10 8 .M©, for instance, is inappropriate if there 
is no best-fit value. Instead, a notation like M, = 
(3 — 7) x 1O 8 .M0 would more correctly convey the re- 
sult that any value in the specified range is equally likely. 
In addition, when using estimated black hole masses as 
data points in other statistical studies, care will have to 
be taken to deal correctly with the degeneracy. For in- 



stance, standard least-squares fitting assumes that there 
exists a best estimate of the measured quantities and that 
the errors about that estimate are normally distributed. 
Both assumptions are incorrect when the measured quan- 
tity is indeterminate. 

An important motivation for measuring black hole 
masses in galactic nuclei is to refine and extend the 
M t — a relation. Past discussions of the uncertainties 
in that relation have focus ed on statistical techniques 
ijMerritt fc Ferrarese 2001a[) or on systematic differences 
in the definitions of a l)Tremaine et al. 20 02). We sug- 
gest that the largest source of uncertainty in the M, — a 
relation is likely to be the degeneracy in SBH masses as 
determined from stellar kinematical data. 

10. CONCLUSIONS 

1. The axisymmetric potential estimation problem is 
generically under-determined: a range of values for 
quantities like M t , the black hole mass, and T, 
the mass-to-light ratio of the stars, can generally 
be found that are equally consistent with the ob- 
served kinematics. The indeterminacy arises from 
the large number of distinct distribution functions 
/ that can reproduce a given mass model. 

2. The indeterminacy becomes apparent only when 
the modeling algorithm is flexible enough to repre- 
sent a wide range of stellar distribution functions. 
In practice, this means having a sufficient number 
of distinct orbits or phase-space cells. When the 
orbit library is too small, spurious minima appear 
in plots like % 2 (M») due to the algorithm's inabil- 
ity to reproduce certain orbital populations as well 
as others. 

3. When the LOSVDs are well sampled, there is no 
advantage to fitting the full LOSVD over fitting 
just the GH moments, even when they have large 
wings. The only exceptions are likely to be when 
LOSVDs are multimodal. 

4. A re-analysis of data for M32 published prior to 
2000 reveals that these data do not imply a pre- 
ferred or best-fit value for the black hole mass, con- 
trary to claims made in the literature. We show 
that a range of values, 1.5 x 10 6 M Q < M. < 
5 x 10 6 A / Jq, are equally consistent with these data. 
We demonstrate that the best-fit values of M, in 
M32 derived in earlier studies are likely to have 
been biased by the use of too few orbits to repre- 
sent /. 

5. Regularization reduces the range of acceptable 
models, but we find no indication that the true 
potential can be recovered simply by enforcing 
smoothness. For a given smoothing level, all so- 
lutions in the minimum-x 2 valley exhibit similar 
levels of noise; as the smoothing is increased, there 
is a systematic shift in the midpoint of the \ 2 va l- 
ley, until at a high level of smoothing the solution 
is biased with respect to the true solution. 
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